Austrian Patent AT521665A1: Grammar recognition

Abstract:
The invention relates to a method for characterizing the state of a computer system, wherein
- logs are generated by the computer system or by processes running on it, in that, when predetermined events occur, a log line (L1, ..., L100) is created for each of these events, the log line (L1, ..., L100) describing the respective logged event,
- each log line (L1, ..., L100) created in this way is divided into a number of substrings, and a syntax tree describing the possible sequence of substrings is created on the basis of the individual log lines (L1, ..., L100), the sequence of the individual substrings contained in them, and the frequency with which the log lines and the substrings in the log lines occur, and
- this syntax tree is regarded as characteristic of the state of the computer system.
Publication number: AT521665A1
Application number: T50461/2018
Filing date: 2018-06-11
Publication date: 2020-03-15
Inventors: Wurzenberger Markus; Landauer Max; Fiedler Roman; Florian Skopik Ddr
Applicant: Ait Austrian Institute Tech Gmbh
Main IPC:

Description:

Summary
The invention relates to a method for characterizing the state of a computer system, wherein
- logs are created by the computer system or by processes running on it, in that, when predetermined events occur, a log line (L1, ..., L100) is created for each of these events, the log line (L1, ..., L100) describing the logged event,
- each log line (L1, ..., L100) created in this way is subdivided into a number of substrings, and a syntax tree describing the possible sequence of substrings is created on the basis of the individual log lines (L1, ..., L100), the sequence of the individual substrings contained in them, and the frequency with which the log lines and the substrings in the log lines occur, and
- this syntax tree is regarded as characteristic of the state of the computer system.
(Fig. 1)
The invention relates to a method for characterizing the state of a computer system. The invention thus provides a structured description of the system state, which can be used in particular to detect abnormal states in computer systems.
Various methods are known from the prior art with which the internal state of a computer system can be characterized on the basis of certain meaningful parameters or data structures. The basic purpose of such methods is to recognize abnormal operating states by obtaining characteristic parameters or data structures for a computer system. Such abnormal operating states may be, for example, states in which the system has been modified by a hacker attack or in which the system malfunctions due to unintentional changes. In addition, other deviating operating modes are also detected, for example those associated with software updates or similar changes to the computer system.
The present invention provides a further simple and easy-to-use way of characterizing a computer system.
The invention is based on the idea of analyzing individual log lines created by the computer system or by the programs running on it and of determining a grammar for these log lines by means of which the structure of the individual log lines can be characterized.
In contrast to grammars of computer languages, which are usually specified in advance, within the scope of the invention a grammar or a tree representing the grammar of the protocol lines is created on the basis of existing protocol lines. Subsequent log lines can then be checked to see whether they are considered grammatical in the sense of the grammar thus determined and whether they can be regarded as belonging to the normal system state.
If, however, a log line deviates from the grammar, a situation has been found that cannot be reconciled, at least with a certain probability, with the previously determined normal operating state. In this case, a system state can be detected that is regarded as abnormal in the present sense. Such an abnormal operating state can be established in particular if several newly determined log lines, or a number of them exceeding a threshold value, subsequently prove to be non-grammatical.
Another advantageous way of characterizing the system state and detecting changes in it is to create a graph or syntax tree characterizing the grammar separately at different times or for different time windows and to examine the differences between the syntax trees thus created. If these syntax trees differ significantly, a deviation in the operating state can be determined.
The invention provides a method for characterizing the state of a computer system, in which
- logs are created by the computer system or by processes running on it, in that, when predetermined events occur, a log line is created for each of these events, the log line describing the respective logged event,
- each log line created in this way is divided into a number of substrings,
- a syntax tree describing the possible sequence of substrings is created on the basis of the individual log lines, the sequence of the individual substrings contained in them, and the frequency with which the log lines and the substrings in the log lines occur, and
- this syntax tree is regarded as characteristic of the state of the computer system.
According to the invention, this provides a simple and efficient way of characterizing the state of the system and of keeping this characterization available for further investigations.
To create a representation that describes the system behavior and also reflects the frequency distribution of the individual log lines, the syntax tree can be created from the individual log lines as an acyclic directed graph, wherein
- in addition to a root node (N), the syntax tree has further nodes (N1, N2), each of which is assigned a pattern (P1, P2) that, when applied to one of the substrings, yields a positive or negative match value,
- the syntax tree has individual directed edges, each of which connects its source node to its target node if, among the individual log lines, the conditional probability that
  - given that the log line (L1, ..., L100) contains substrings which, in their order, positively match the patterns of the nodes (N1, N11, N111) along a directed partial path leading from the root node (N) of the syntax tree to the source node (N11) of the edge (e111), following the order of the nodes in that partial path,
  - the next substring (s13) in the relevant log line (s1) is a substring that positively matches the pattern stored in the target node (N111) of the relevant edge (e111),
  exceeds a threshold that depends on the position in the syntax tree, and
- in particular, this conditional transition probability is assigned to the edge in question.
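The node-and-edge structure just defined can be illustrated with a minimal Python sketch; it is not the patent's implementation. It assumes that each pattern is a regular expression that must match a whole substring, and that a line is accepted when its substrings follow some directed path from the root that ends at a leaf (or at a node flagged as allowing a line end). The names `Node`, `add_edge`, `may_end` and `is_grammatical` are hypothetical, the time-stamp regex is deliberately simplified, and the edge values 1.0, 0.24 and 1.0 are those of the worked example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical syntax-tree node: a pattern plus outgoing directed edges."""
    name: str
    pattern: str = r".*"                       # regex a substring must match in full
    edges: list = field(default_factory=list)  # (transition probability, target node)
    may_end: bool = False                      # a line may end after this node

    def add_edge(self, target: "Node", probability: float) -> "Node":
        # The edge stores the conditional transition probability.
        self.edges.append((probability, target))
        return target

    def matches(self, substring: str) -> bool:
        return re.fullmatch(self.pattern, substring) is not None


def is_grammatical(root: Node, tokens: list) -> bool:
    """True if the substrings, in order, follow a directed path from the root."""
    def walk(node: Node, i: int) -> bool:
        if i == len(tokens):
            # All substrings consumed: accept at a leaf or at an allowed line end.
            return not node.edges or node.may_end
        return any(t.matches(tokens[i]) and walk(t, i + 1) for _, t in node.edges)
    return walk(root, 0)


# Partial tree using reference signs and edge values from the worked example.
root = Node("N")
n1 = root.add_edge(Node("N1", r"\w{3}/\d{2}/\d{2}:\d{2}:\d{2}"), 1.0)
n11 = n1.add_edge(Node("N11", "A"), 0.24)
n111 = n11.add_edge(Node("N111", "E"), 1.0)
```

With this partial tree, a line reading `Jul/18/00:00:01 A E` is grammatical, while one with `L` in second position is not, because no edge leads to a node with that pattern.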
In order to obtain a quick and simple check for changes in system behavior, it can be provided that
a) a syntax tree is created, in particular by a method according to one of the preceding claims, on the basis of a number of predetermined log lines (L1, ..., L100) generated within a first period of time,
b) during a second period of time, the log lines generated by the computer system or by the processes running on it, one for each occurrence of a predetermined event, are collected,
c) a parser created from the syntax tree is used to check whether and/or to what extent the log lines collected in step b) comply with the rules specified by the syntax tree, and
d) an abnormal state is detected in particular when the number of log lines determined during the first period that comply with the rules specified by the syntax tree and the number of log lines determined during the second period that comply with these rules differ from one another by more than a predetermined amount.
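Steps a) to d) can be sketched as follows, assuming that a matcher derived from the syntax tree is already available; here it is replaced by a hypothetical toy function. Following the text, the two periods are compared by absolute counts of conforming lines; comparing shares instead would be a natural variant when the periods differ in length. `conforming_count`, `abnormal_state`, `toy_parser` and `max_difference` are hypothetical names.

```python
from typing import Callable, Iterable


def conforming_count(lines: Iterable[str], is_grammatical: Callable[[str], bool]) -> int:
    """Number of log lines that comply with the rules of the syntax tree."""
    return sum(1 for line in lines if is_grammatical(line))


def abnormal_state(first_period, second_period, is_grammatical, max_difference):
    """Step d): flag an abnormal state if the conforming counts differ too much."""
    n_first = conforming_count(first_period, is_grammatical)
    n_second = conforming_count(second_period, is_grammatical)
    return abs(n_first - n_second) > max_difference


def toy_parser(line: str) -> bool:
    """Toy stand-in for a parser built from a syntax tree: second substring A-D."""
    return line.split()[1] in {"A", "B", "C", "D"}
```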
An alternative way of recognizing changes in the system behavior provides
- that a syntax tree according to one of the preceding claims is determined for the same computer system at different times, or for different computer systems with a similar structure and purpose,
- that deviations between the syntax trees thus created are searched for, and
- that, in the event of deviations exceeding a predetermined threshold value, a deviating, in particular critical or abnormal, state of the computer system is reported.
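The text leaves the deviation measure open. As one possible, purely illustrative choice, the sketch below represents each syntax tree as a mapping from a path (a tuple of patterns starting at the root) to the transition probability of the path's last edge, and sums the absolute differences, treating a path missing from one tree as probability 0. `tree_deviation` and `report_deviation` are hypothetical names.

```python
def tree_deviation(tree_a: dict, tree_b: dict) -> float:
    """Sum of absolute edge-probability differences over all paths.

    Each tree maps a path (tuple of patterns from the root) to the transition
    probability of the path's last edge; a missing path counts as 0.
    """
    paths = set(tree_a) | set(tree_b)
    return sum(abs(tree_a.get(p, 0.0) - tree_b.get(p, 0.0)) for p in paths)


def report_deviation(tree_a: dict, tree_b: dict, threshold: float) -> bool:
    """True if the trees deviate by more than the predetermined threshold."""
    return tree_deviation(tree_a, tree_b) > threshold
```

Other measures (for example, counting only structural differences such as added or missing paths) would fit the description equally well.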
For storing a representation of the system behavior, it is particularly advantageous if the syntax tree is a rooted tree, preferably a rooted out-tree (arborescence), in which all edges are directed away from the root.
An efficient method for creating a syntax tree provides
- that, when the syntax tree is created, the individual substrings of the log lines are stored in a two-dimensional memory with two access indices, the first access index, the row index, indicating the log line and the second access index, the position index, indicating the position of the substring within the respective log line,
- that, for the substrings to which the lowest position index is assigned,
  - a number of patterns is searched for which describe the majority of these substrings,
  - for each pattern, the probability that one of the substrings matches the pattern is determined,
  - for each pattern, a node of a first layer is inserted into the syntax tree,
  - the respective pattern and those log lines whose substrings at this position match the pattern of the node are assigned to this node,
  - this node is connected as the target node to the root node of the syntax tree via a directed edge, and
  - the previously determined probability is assigned to this edge, and
- that, for an incrementally increasing position index of the substrings in the log lines, separately for each group of log lines assigned to a base node of the immediately preceding layer of the graph,
  - a number of patterns is searched for which describe the majority of the substrings at the position given by the respective position index,
  - for each pattern, the probability that the substrings with the relevant position index match the pattern is determined,
  - for each pattern, a node of the layer corresponding to the position index is inserted into the graph,
  - the respective pattern and those log lines whose substrings at this position match the pattern of the node are assigned to this node,
  - this node is connected as the target node to the base node via a directed edge, and
  - the previously determined probability is assigned to this edge.
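The column-wise, layer-by-layer procedure can be sketched recursively in Python. This is a simplified sketch under stated assumptions: the log lines are already tokenized into lists (the list index acts as the row index and the index inside each list as the position index, counting from 0), patterns are exact substrings only, and only the Θ1 criterion is applied, so the refinements with further thresholds are omitted. `build_subtree` is a hypothetical name.

```python
from collections import Counter


def build_subtree(rows, position, theta1=0.1):
    """Layer-by-layer construction for one group of tokenized log lines.

    rows: list of token lists; the list index plays the role of the row index,
    the index inside each token list that of the position index (0-based).
    Only exact substrings whose relative frequency within the group exceeds
    theta1 become nodes; lines that already ended still count in the total.
    Returns {pattern: (edge_probability, subtree)}.
    """
    total = len(rows)
    counts = Counter(r[position] for r in rows if len(r) > position)
    subtree = {}
    for pattern, count in counts.items():
        probability = count / total
        if probability > theta1:
            # Group of the new node: lines whose substring here matches the pattern.
            group = [r for r in rows if len(r) > position and r[position] == pattern]
            subtree[pattern] = (probability, build_subtree(group, position + 1, theta1))
    return subtree


rows = [["A", "E", "H"]] * 3 + [["B", "E"]] * 2
tree = build_subtree(rows, 0)
```

Each recursive call handles exactly one group of log lines assigned to a base node, which mirrors the "separately for individual groups" wording and keeps the work per layer proportional to the number of substrings in that column.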
The following rules are useful for creating syntax trees. It can be provided that, if no pattern can be found for the substrings whose relative frequency exceeds a probability threshold value Θ1, a single node is inserted at the relevant point whose pattern matches every substring; in particular, a probability of 1 is assigned to the newly created edge and/or all log lines assigned to the source node of the edge are assigned to the newly inserted target node.
Alternatively or additionally, it can be provided that, if a pattern can be found for the substrings such that the relative frequency of the substrings matching the pattern exceeds a first probability threshold value Θ1,
- in the event that this relative frequency also exceeds a second probability threshold value Θ2, a single node with the relevant pattern is inserted; in particular, the share of substrings matching the pattern is assigned to the newly created edge and/or all log lines assigned to the source node of the edge that contain a substring matching the pattern at the relevant position are assigned to the newly inserted target node, and/or
- in the event that this relative frequency does not also exceed the second probability threshold value Θ2, a single node is inserted at the relevant point whose pattern matches every substring; in particular, a probability of 1 is assigned to the newly created edge and all log lines assigned to the source node of the edge are assigned to the newly inserted target node.
Alternatively or additionally, it can be provided that, if several patterns can be found for the substrings such that the relative frequency of the substrings matching each individual pattern exceeds the first probability threshold value Θ1,
- in the event that the sum of the relative frequencies of these patterns also exceeds a third probability threshold value Θ3, a number of nodes are inserted, each with one of the patterns found; in particular, each node is connected to the original node via an edge, and/or the share of the substrings matching the pattern in the total number of log lines assigned to the original node is assigned to each newly created edge, and/or the individual log lines are divided among the nodes so that each log line is assigned to the node whose pattern matches its substring under consideration, and/or
- in the event that the sum of the relative frequencies of these patterns does not exceed the third probability threshold value Θ3, a single node is inserted at the relevant point whose pattern matches every substring; in particular, a probability of 1 is assigned to the newly created edge, and/or all log lines assigned to the source node of the edge are assigned to the newly inserted target node.
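The three rules above reduce to a small decision function. This sketch assumes that the relative frequencies for one base node have already been computed, represents the match-everything pattern by a hypothetical constant `WILDCARD`, and uses the threshold values of the exemplary embodiment (Θ1 = 0.1, Θ2 = 0.95, Θ3 = 0.9) as defaults. It returns the patterns of the nodes to insert together with their edge probabilities; `select_patterns` is a hypothetical name.

```python
WILDCARD = ".*"  # hypothetical pattern that matches every substring


def select_patterns(frequencies, theta1=0.1, theta2=0.95, theta3=0.9):
    """Decide which child nodes to insert below a base node.

    frequencies: {pattern: relative frequency among the base node's lines}.
    Returns {pattern: edge probability}.
    """
    frequent = {p: f for p, f in frequencies.items() if f > theta1}
    if len(frequent) == 1:
        (pattern, freq), = frequent.items()
        # Single frequent pattern: a fixed node only if it also exceeds theta2.
        return {pattern: freq} if freq > theta2 else {WILDCARD: 1.0}
    if len(frequent) > 1 and sum(frequent.values()) > theta3:
        # Several frequent patterns that together cover enough lines.
        return frequent
    # No frequent pattern, or too little coverage: one node matching everything.
    return {WILDCARD: 1.0}
```

With the second-column frequencies of the example (0.24 for each of A, B, C, D and 0.04 for L), the function returns four nodes, since 0.96 exceeds Θ3.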
Alternatively or additionally, it can be provided that, if the number of log lines that end at the relevant position exceeds a predetermined fourth probability threshold value Θ4 and the number of log lines that do not end at the relevant position exceeds a predetermined fifth probability threshold value Θ5, the option of an immediate line end is added to the pattern of the relevant node. A log line then matches at this node if its substring matches the pattern and
- the subsequent substrings of the log line also match the subsequent patterns of the syntax tree, or
- the log line ends after this substring.
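A sketch of the line-end test, assuming that the node's log lines are token lists, that positions count from 0, and that Θ4 and Θ5 are applied to shares of the node's lines rather than to absolute numbers (an assumption; the text speaks both of numbers and of probability thresholds). The sample group mirrors the table: of the 24 lines with B in second position, 4 (L15, L39, L83, L86) end after I. The threshold values 0.1 and the name `allow_line_end` are hypothetical.

```python
def allow_line_end(group, position, theta4, theta5):
    """Decide whether the node at `position` (0-based) gets an optional line end.

    group: token lists of the log lines assigned to the node.
    The shares of lines ending right after this position and of lines
    continuing beyond it are compared with theta4 and theta5.
    """
    total = len(group)
    ending = sum(1 for tokens in group if len(tokens) == position + 1)
    continuing = sum(1 for tokens in group if len(tokens) > position + 1)
    return ending / total > theta4 and continuing / total > theta5


# Lines with B in second position: 20 read "B E I K", 4 end after I.
b_group = [["ts", "B", "E", "I", "K"]] * 20 + [["ts", "B", "E", "I"]] * 4
```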
In an improved procedure for creating specific trees adapted to the respective system state, the threshold values used in creating the syntax tree are adapted to the distance from the root node of the syntax tree or to the path depth in the syntax tree, in particular
- the first to fourth probability threshold values increase or decrease monotonically with increasing distance from the root node or increasing path depth in the syntax tree.
In order to include basic components of log lines, such as IP addresses, as building blocks of the check, it can be provided that the following are specified as patterns:
- predefined basic patterns, in particular for IP addresses or other structured data, and/or
- individual character strings defined during the creation of the syntax tree.
To adapt the syntax tree to changed states of the system, it can be provided
- that, for the individual paths, the conditional transition probability is determined over a predetermined number of time windows, and
- that the further time course of the conditional transition probabilities of the individual edges thus determined is examined, in particular after each time window or after a predetermined number of time windows, to determine whether they fall below a predetermined probability threshold value, and in this case
  - the affected paths are deleted from the graph and/or
  - individual nodes of the graph are assigned variable parts of the log lines instead of invariable parts.
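One way to realize this adaptation is sketched below under assumptions not stated in the text: for each edge, hypothetical per-window counts are kept of the lines that reached the source node and of those that followed the edge; the conditional transition probability is their ratio, and an edge is dropped when that probability stayed below the threshold in each of the last `last_n` windows. Reassigning nodes to variable parts is not shown, and `edge_probabilities`, `prune_edges` and the edge keys are hypothetical.

```python
def edge_probabilities(windows):
    """Conditional transition probability per time window.

    windows: list of (lines_at_source, lines_taking_edge) pairs.
    """
    return [taken / at_source if at_source else 0.0 for at_source, taken in windows]


def prune_edges(edges, threshold, last_n=1):
    """Keep only edges whose probability did not stay below threshold
    in each of the last_n time windows."""
    kept = {}
    for edge, windows in edges.items():
        recent = edge_probabilities(windows)[-last_n:]
        if recent and all(p < threshold for p in recent):
            continue  # stale edge: delete it (and the path it starts) from the graph
        kept[edge] = windows
    return kept
```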
In order to recognize advantageously when the syntax tree needs to be changed, it can be provided that, if at a later point in time after the creation of the syntax tree a significantly high number of log lines occur that cannot be assigned to any of the directed paths of the syntax tree formed by nodes and edges, the syntax tree is modified by adding a corresponding path that characterizes these log lines and, if necessary, the path probabilities assigned to the individual edges are adapted to the newly occurring log lines.
In order to recognize easily whether a newly created log line reflects the previous system state or is new and therefore indicates a change in the system state, it can be provided that at least one further log line is created by the computer system or by processes running on it and that a parser created on the basis of the syntax tree is used to examine whether this further log line matches the syntax tree, a lack of agreement possibly being regarded as an indication of a different system state.
A computer program for carrying out a method according to the invention can preferably be stored on a data carrier.
Recording and preprocessing of log lines:
Some advantageous embodiments of the invention are described in more detail below with reference to an exemplary embodiment.
The table below shows an example set of log lines L1, ..., L100 for a better understanding of how the parser generator described below works. Since log files used in practice have very different structures and are usually considerably longer, a log file of reduced length is used here for demonstration purposes. The individual words in the log lines are abstracted by capital letters A, ..., Z, each capital letter standing for a specific word in the individual lines. Alternatively, a capital letter in the table may also stand for a certain other pattern; for example, a letter may indicate the occurrence of an IP address, a time stamp in a specified format, or another structured character string.
The log lines L1, ..., L100 shown in the table were recorded at different times in the relevant computer system and were broken down into a plurality of substrings using a tokenization algorithm before the analysis. During tokenization, certain special or blank characters are used to separate the log lines L1, ..., L100 into individual substrings. In the present case, spaces were used as separators. However, other special characters or substrings with which log lines L1, ..., L100 are typically divided into individual words can readily be used, usually commas, tabs, brackets, semicolons or similar separators and special characters.
      Position index
Row   1                 2   3   4   5
L1    Jul/18/00:00:01   A   E   H   J
L2    Jul/18/00:00:04   B   E   I   K
L3    Jul/18/00:00:05   C   F   G
L4    Jul/18/00:00:10   B   E   I   K
L5    Jul/18/00:00:18   A   E   H   J
L6    Jul/18/00:00:20   D   E   I   A
L7    Jul/18/00:00:21   B   E   I   K
L8    Jul/18/00:00:24   C   F   G
L9    Jul/18/00:00:25   L   M   N   O
L10   Jul/18/00:00:26   A   E   H   J
L11   Jul/18/00:00:27   L   M   N   O
L12   Jul/18/00:00:28   A   E   H   J
L13   Jul/18/00:00:30   B   E   I   K
L14   Jul/18/00:00:31   A   E   H   J
L15   Jul/18/00:00:32   B   E   I
L16   Jul/18/00:00:33   D   E   I   B
L17   Jul/18/00:00:35   A   E   H   J
L18   Jul/18/00:00:37   D   E   I   C
L19   Jul/18/00:00:38   D   E   I   D
L20   Jul/18/00:00:39   C   F   H
L21   Jul/18/00:00:40   D   E   I   E
L22   Jul/18/00:00:41   A   E   H   J
L23   Jul/18/00:00:42   C   F   I
L24   Jul/18/00:00:44   B   E   I   K
L25   Jul/18/00:00:45   D   X   I   F
L26   Jul/18/00:00:46   C   F   G
L27   Jul/18/00:00:48   B   E   I   K
L28   Jul/18/00:00:49   D   E   I   G
L29   Jul/18/00:00:51   A   E   H   J
L30   Jul/18/00:00:55   C   F   H
L31   Jul/18/00:00:58   A   E   H   J
L32   Jul/18/00:01:02   B   E   I   K
L33   Jul/18/00:01:03   D   E   I   H
L34   Jul/18/00:01:05   C   F   I
L35   Jul/18/00:01:07   B   E   I   K
L36   Jul/18/00:01:09   C   F   G
L37   Jul/18/00:01:10   D   E   I   I
L38   Jul/18/00:01:13   A   E   H   J
L39   Jul/18/00:01:14   B   E   I
L40   Jul/18/00:01:15   D   E   I   J
L41   Jul/18/00:01:16   A   E   H   J
L42   Jul/18/00:01:17   D   E   I   K
L43   Jul/18/00:01:18   C   F   I
L44   Jul/18/00:01:19   C   F   H
L45   Jul/18/00:01:20   C   F   H
L46   Jul/18/00:01:22   D   Y   I   L
L47   Jul/18/00:01:24   B   E   I   K
L48   Jul/18/00:01:25   D   E   I   M
L49   Jul/18/00:01:27   A   E   H   J
L50   Jul/18/00:01:28   B   E   I   K
L51   Jul/18/00:01:33   A   E   H   J
L52   Jul/18/00:01:34   C   F   G
L53   Jul/18/00:01:36   L   M   N   O
L54   Jul/18/00:01:39   D   E   I   N
L55   Jul/18/00:01:41   C   F   G
L56   Jul/18/00:01:42   D   E   I   O
L57   Jul/18/00:01:44   C   F   G
L58   Jul/18/00:01:46   A   E   G
L59   Jul/18/00:01:47   D   E   I   P
L60   Jul/18/00:01:49   C   F   W
L61   Jul/18/00:01:50   B   E   I   K
L62   Jul/18/00:01:52   A   E   H   J
L63   Jul/18/00:01:54   C   F   I
L64   Jul/18/00:01:55   A   E   H   J
L65   Jul/18/00:01:57   B   E   I   K
L66   Jul/18/00:01:59   D   E   I   Q
L67   Jul/18/00:02:00   B   E   I   K
L68   Jul/18/00:02:02   C   F   X
L69   Jul/18/00:02:04   A   E   H   J
L70   Jul/18/00:02:06   A   E   H   J
L71   Jul/18/00:02:07   B   E   I   K
L72   Jul/18/00:02:08   D   E   I   R
L73   Jul/18/00:02:10   C   F   G
L74   Jul/18/00:02:11   L   M   N   O
L75   Jul/18/00:02:13   D   E   I   S
L76   Jul/18/00:02:14   A   E   H   J
L77   Jul/18/00:02:16   D   E   I   T
L78   Jul/18/00:02:19   A   E   H   J
L79   Jul/18/00:02:22   C   F   Y
L80   Jul/18/00:02:23   B   E   I   K
L81   Jul/18/00:02:25   A   E   H   J
L82   Jul/18/00:02:26   C   F   I
L83   Jul/18/00:02:28   B   E   I
L84   Jul/18/00:02:29   C   F   G
L85   Jul/18/00:02:30   D   Z   I   U
L86   Jul/18/00:02:32   B   E   I
L87   Jul/18/00:02:33   A   E   H   J
L88   Jul/18/00:02:36   D   E   I   V
L89   Jul/18/00:02:37   B   E   I   K
L90   Jul/18/00:02:39   C   F   G
L91   Jul/18/00:02:41   C   F   Z
L92   Jul/18/00:02:43   B   E   I   K
L93   Jul/18/00:02:46   A   E   H   J
L94   Jul/18/00:02:47   B   E   I   K
L95   Jul/18/00:02:48   C   F   I
L96   Jul/18/00:02:51   B   E   I   K
L97   Jul/18/00:02:52   D   E   I   W
L98   Jul/18/00:02:53   A   E   H   J
L99   Jul/18/00:02:55   B   E   I   K
L100  Jul/18/00:02:59   D   E   I   X
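The tokenization described above can be sketched with Python's `re` module. The separator set is an assumption built from the characters the text names (spaces, tabs, commas, semicolons, brackets); the example itself only splits on spaces. The characters `/` and `:` are deliberately not separators, so a time stamp stays a single substring. `SEPARATORS` and `tokenize` are hypothetical names.

```python
import re

# Assumed separator set: whitespace plus the characters named in the text.
SEPARATORS = re.compile(r"[\s,;()\[\]{}]+")


def tokenize(line: str) -> list:
    """Split a log line into substrings and drop empty pieces."""
    return [token for token in SEPARATORS.split(line) if token]
```

Applied to log line L1, `tokenize("Jul/18/00:00:01 A E H J")` yields the five substrings `Jul/18/00:00:01`, `A`, `E`, `H` and `J`.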
In the present exemplary embodiment, each log line L1, ..., L100 begins with a time stamp indicating the time of the event described by the respective log line. This structure is common for log lines, but within the scope of the invention it is not strictly necessary.
For the individual nodes of the syntax trees, fixed substrings can be specified as a pattern, which the respective substrings have to correspond to.
In addition, there is also the option of not specifying a substring at a specific position and accordingly allowing a node with variable content. In addition, a pattern can be specified for the individual nodes used in the creation of syntax trees, for example by means of a regular expression. This can be, for example, an IP address or a date stamp in a specific - more or less abstract - date format, or restricted alphabets, such as only numbers and periods.
The automated creation of syntax trees can be carried out in different ways; the exemplary embodiment shown here is a particularly resource-saving procedure, in particular with regard to time and memory. After the tokenization, i.e. the division of the individual log lines L1, ..., L100 into substrings, the log lines are processed column by column.
In this context, two indices are assigned to each individual substring, namely a row index that designates the log line L1, ..., L100 in which the substring is located, and a position index that indicates the position of the substring within the respective log line L1, ..., L100.
Consider, for example, the first line of the log, i.e. the log line L1 with the row index 1, which reads as follows:
Jul/18/00:00:01 A E H J
The log line L1 contains four spaces, so it is divided into five substrings separated by the spaces: the time stamp Jul/18/00:00:01 and the four subsequent letters A, E, H and J, which abstract words. The other log lines are divided into substrings in the same way.
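A sketch of the two-index storage, assuming 1-based row and position indices as in the text and a plain dictionary keyed by (row index, position index) as the two-dimensional memory; `index_substrings` and `column` are hypothetical helper names. Reading one position index across all rows gives the column view used in the following procedure.

```python
def index_substrings(lines):
    """Store each substring under (row index, position index), both 1-based."""
    table = {}
    for row, line in enumerate(lines, start=1):
        for position, substring in enumerate(line.split(" "), start=1):
            table[(row, position)] = substring
    return table


def column(table, position):
    """All substrings with the given position index, in row order."""
    return [s for (row, pos), s in sorted(table.items()) if pos == position]


lines = ["Jul/18/00:00:01 A E H J", "Jul/18/00:00:04 B E I K", "Jul/18/00:00:05 C F G"]
table = index_substrings(lines)
```

Lines with fewer substrings (such as L3) simply have no entry at the higher position indices.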
Efficient procedure for creating a syntax tree:
In the following procedure, the individual log lines, stored in a table, are processed column by column.
In a first step, all sub-strings of protocol lines are considered whose position index has the value 1. In the present exemplary embodiment, these are the individual partial character strings in which the time stamp of the respective protocol lines is contained.
In the course of the analysis of the individual substrings with position index 1, it is found that they all follow a previously known, predetermined pattern, namely the following regular expression, and that the individual substrings can therefore be represented by a node to which this pattern is assigned:
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/(0[1-9]|[12][0-9]|3[0-1])/(2[0-3]|[01][0-9]):[0-5][0-9]:[0-5][0-9]
As part of the check of the individual partial character strings of the first column, matches with this pattern are recognized for all of the determined protocol lines. The probability that a log line as the first substring contains a time stamp that corresponds to the pattern thus defined is therefore 1.
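The check of the first column can be reproduced with Python's `re` module using the regular expression above; `TIMESTAMP` and `match_probability` are hypothetical names, the latter returning the share of substrings that match the pattern in full.

```python
import re

# The time-stamp pattern from the text: month/day/hour:minute:second.
TIMESTAMP = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"/(0[1-9]|[12][0-9]|3[0-1])"
    r"/(2[0-3]|[01][0-9]):[0-5][0-9]:[0-5][0-9]"
)


def match_probability(substrings, pattern=TIMESTAMP):
    """Relative frequency of substrings that fully match the pattern."""
    if not substrings:
        return 0.0
    return sum(1 for s in substrings if pattern.fullmatch(s)) / len(substrings)
```

For the first column of the table, every substring matches, which gives the probability of 1 stated above.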
The syntax tree to be created contains a root node that symbolizes the start of the syntax check by the parser. Since in the present case the probability that the log lines have a substring matching the pattern at the first position is 1, and therefore exceeds both a first probability threshold value Θ1 = 0.1 and a second probability threshold value Θ2 = 0.95, a single node N1 is inserted into the syntax tree. As the target node, it is connected to the root node N0 via a directed edge e1. This example already illustrates a rule R2a for the recursive creation of syntax trees.
As this example shows, the procedure starts from a node, in this case the root node, to which all log lines are assigned. Among these log lines, statistical measures are used to search for as few patterns as possible that match as many of the first substrings as possible. In the present case, such a pattern is easy to find, since the condition defined by the pattern holds for all first substrings of the log lines.
To characterize the relative frequency with which the individual protocol lines or their first partial character string correspond to the pattern assigned to the node, the edge e1 is assigned the value 1 or 100%. The node N1 created in the first step is assigned to a first layer Y1 of the syntax tree.
In the second step, the second column is considered, which contains all substrings of the log lines whose position index is 2. In the table, this column contains a total of five different substrings. Each of the substrings A, B, C and D occupies the second position in 24 of the log lines under consideration, while the substring L occupies the second position in only four log lines, namely L9, L11, L53 and L74. Since the log file has a total of 100 lines, the relative frequency of occurrence of the individual substrings can now be determined. In the present exemplary embodiment, the relative frequency of each of the substrings A, B, C and D is 0.24, or 24%, and that of the substring L in the second position is 0.04, or 4%.
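These relative frequencies can be recomputed with `collections.Counter`. The list below is rebuilt from the counts stated in the text rather than read from the raw table; `second_column` and `relative` are hypothetical names.

```python
from collections import Counter

# Second-position substrings rebuilt from the counts stated in the text:
# A, B, C and D in 24 lines each, L in 4 lines (L9, L11, L53, L74).
second_column = ["A"] * 24 + ["B"] * 24 + ["C"] * 24 + ["D"] * 24 + ["L"] * 4

counts = Counter(second_column)
total = len(second_column)  # 100 log lines
relative = {substring: n / total for substring, n in counts.items()}
```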
To build the grammar tree, it must next be examined which of the substrings occur in the log lines frequently enough to be transferred to the syntax tree.
It can be seen that there are several different substrings A, B, C, D whose relative frequencies exceed a predetermined first threshold value Θ1, while the relative frequency of the substring L does not exceed this threshold value. In the present exemplary embodiment, this threshold value is set to Θ1 = 0.1. The substring L, whose relative frequency does not exceed Θ1, is disregarded in what follows and is not used for creating the syntax tree.
The sum of the relative frequencies of the substrings A, B, C, D, each of which exceeds the first threshold value, is then determined: 0.24 + 0.24 + 0.24 + 0.24 = 0.96. This value exceeds a third probability threshold value Θ3, which in the present exemplary embodiment is Θ3 = 0.9. Since several substrings have thus occurred whose individual relative frequencies exceed the first probability threshold value Θ1 and whose summed relative frequency exceeds the third probability threshold value Θ3, several nodes are inserted into the syntax tree, each of which is assigned a pattern matching one of these substrings. The individual nodes N11, N12, N13, N14 are connected as target nodes to the preceding node N1 via directed edges.
This procedure illustrates a rule R3a: if, among the log lines whose first substrings match the individual patterns of the partial path up to a certain base node (here node N1 of the syntax tree), the following (here the second) substring can be matched by several patterns A, B, C, D such that, for each individual pattern, the number of matching log lines exceeds a first threshold value and the total number of these log lines exceeds a third threshold value, then a separate node N11, N12, N13, N14 containing the respective pattern A, B, C, D is created for each pattern, and these nodes are connected to the base node via directed edges e11, e12, e13, e14.
Each node N11, N12, N13, N14 is assigned a pattern matching the respective substring, i.e. when a log line is checked, the syntax tree delivers a positive match value for the second substring if the substring at the second position, i.e. the position with position index 2, is A, B, C or D.
In the present exemplary embodiment, each edge is again assigned the probability or relative frequency indicating how likely it is that a log line assigned to the preceding node N1 contains, at its second position, a substring matching the pattern associated with the target node.
To make the remaining probability easier to understand, an additional edge f15, shown in broken lines, is drawn in FIG. 1; it leads to a node M15 that is not part of the syntax tree and stands for the total of four occurrences of the substring L in the log lines. Likewise, the frequency value 0.04 with which the substring L occurs at the relevant position is assigned to this edge for illustration purposes only. Although this node reflects the probability distribution of the second substrings, it is not entered in the syntax tree and is not used to characterize the computer system.
While the first partial character strings allowed no distinction, so that all protocol lines were assigned to node N1, the protocol lines are now distributed to the nodes N11, N12, N13, N14 created in the second step for further processing, i.e. for the further creation of the syntax tree. The protocol lines L9, L11, L53 and L74, which do not correspond to any of the patterns assigned to the nodes N11, N12, N13, N14, are not processed further; for illustration purposes they are assigned to the node M15, which does not belong to the final syntax tree. These are the protocol lines that have the substring l in the second position.
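The second step can be sketched as follows, as a minimal illustration rather than the patented implementation. The example data assume 24 protocol lines each with a, b, c and d in second place (consistent with the counts of 24 quoted for the later nodes) plus the four lines with l, i.e. 100 lines in total. The function name `split_node` and the threshold values are hypothetical, since the description gives no numbers for θ1 and θ3.

```python
from collections import Counter, defaultdict

# Hypothetical threshold values: the description gives no numbers for them.
THETA1 = 0.1  # first probability threshold
THETA3 = 0.9  # third probability threshold


def split_node(lines, pos, theta1=THETA1, theta3=THETA3):
    """Rule R3a for the protocol lines of one base node (sketch).

    lines: protocol lines already split into substrings (lists of str).
    pos:   0-based position examined (1 = the second substring).
    Returns {pattern: (edge_probability, member_lines)} or None if R3a
    does not apply. Lines whose substring gets no node of its own are left
    out, like the lines drawn under the dashed node M15 in FIG. 1.
    """
    total = len(lines)
    counts = Counter(line[pos] for line in lines if len(line) > pos)
    frequent = {s: n / total for s, n in counts.items() if n / total > theta1}
    if len(frequent) < 2 or sum(frequent.values()) <= theta3:
        return None  # R3a does not apply; another rule decides
    children = defaultdict(list)
    for line in lines:
        if len(line) > pos and line[pos] in frequent:
            children[line[pos]].append(line)
    return {s: (frequent[s], children[s]) for s in sorted(frequent)}


# Example data (assumed, see above): a time stamp, then a/b/c/d (24 lines each) or l (4).
lines = [["ts", s] for s in "abcd" for _ in range(24)]
lines += [["ts", "l"] for _ in range(4)]

children = split_node(lines, pos=1)
for pattern, (prob, members) in children.items():
    print(pattern, prob, len(members))  # e.g. "a 0.24 24"; no node for l
```

Each child receives edge probability 0.24, while l (0.04) stays below the hypothetical θ1 and its four lines are dropped, mirroring node M15.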
The nodes N11, N12, N13, N14 created in the second step are assigned to a second layer Y2 of the syntax tree.
In a third step, the partial character strings in third place in each protocol line, i.e. those with position index 3, are considered. These partial character strings are processed separately according to the node N11, N12, N13, N14 to which their respective protocol lines have been assigned. In other words, processing is done node by node: protocol lines that have substring a in second place are treated independently of protocol lines that have substring b in second place, and so on.
The calculations of the relative frequencies described above, also referred to as path frequencies, are now carried out again. All protocol lines that have a time stamp in first place and the partial character string a in second place have the partial character string e in third place, so the relative frequency of e among these protocol lines is 1. Since this relative frequency exceeds the first and second probability threshold values θ1, θ2 and no other substring does so, only a single further node N111 is inserted into the syntax tree according to rule R2a, via a newly created directed edge e111. This means that the syntax tree contains only one option for protocol lines at this point. The directed edge e111 thus created is assigned the probability value 1, and all protocol lines of node N11 are assigned to the newly created node N111.
Looking at the protocol lines that have a time stamp in the first position and the partial character string b in the second position, it can be seen that all of them have an e in the third position.
All protocol lines that have a time stamp in first place and the partial character string c in second place have the partial character string f in third place.
In accordance with rule R2a, a single node N121 with a path frequency of 1 can therefore be arranged downstream of node N12 via a newly inserted edge e121, and all protocol lines of node N12 are assigned to it.
Likewise, in accordance with rule R2a, node N13 can be followed by a single node N131 with a path frequency of 1 via a newly inserted edge e131, and all protocol lines of node N13 are assigned to it.
Considering next the protocol lines of node N14, whose first position contains a time stamp and whose second position contains the partial character string d, exactly 21 of these 24 protocol lines have an e in third place, while each of the remaining three has an x, a y or a z. The relative frequency of the partial character string e is thus 21/24 = 0.875, and the relative frequency of each of the other partial character strings x, y, z is 1/24 ≈ 0.04167. The relative frequency of the substring e is therefore the only one that exceeds the first probability threshold θ1.
However, the relative frequency of the partial character string e is below θ2, so that, in accordance with rule R2b, no separate nodes are formed for the individual partial character strings e, x, y, z; instead, a node N141 is created which contains a variable pattern that matches any substring. Node N141 is connected as the target node to node N14 via the edge e141, and the protocol lines of node N14 are assigned to node N141.
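The decision at node N14 can be written out as a worked inequality. It only restates the counts given above; the bounds on θ1 and θ2 are what these statements imply, not values given in this description.

```latex
\begin{aligned}
f(e) &= \tfrac{21}{24} = 0.875, \qquad f(x) = f(y) = f(z) = \tfrac{1}{24} \approx 0.04167,\\
\text{R2 applies} &\iff f(x),\, f(y),\, f(z) \le \theta_1 < f(e),\\
\text{R2b rather than R2a} &\iff f(e) \le \theta_2,
\quad\text{hence}\quad \tfrac{1}{24} \le \theta_1 < \tfrac{21}{24} \le \theta_2 .
\end{aligned}
```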
Nodes with variable patterns are used to represent parts of protocol lines that are subject to frequent change. This prevents the syntax tree from growing too quickly and becoming extremely complex. In addition, such variable nodes are crucial for parsing in a meaningful way not only the protocol lines used to build the tree but also new, previously unknown protocol lines. Parts of the protocol lines that show high variability in the input data, i.e. consist exclusively or almost exclusively of different, non-repeating character strings (for example a continuously increasing line index), are likely to show similar variability in unknown data, without containing exactly the same characters. In the syntax tree, nodes containing such variable patterns are drawn as pentagons.
The nodes N111, N121, N131, N141 created in the third step are assigned to a third layer Y3 of the syntax tree.
Now that the third column of the above table of protocol lines has been processed, the substrings with position index 4, i.e. those in the fourth column of the table, are treated in a fourth step.
For the protocol lines that have a time stamp in first place, the partial character string a in second place and the partial character string e in third place, the relative frequencies of the partial character strings in fourth place are now examined. Two substrings, g and h, occur as potential successors. The relative probability of the substring g is 1/24 ≈ 4.17%, and that of the substring h is 23/24 ≈ 95.83%. The relative probability of h thus exceeds the first and second probability threshold values θ1, θ2. In this case, according to rule R2a, only a single node N1111 is inserted, with a pattern that matches the substring h but not the substring g. The edge connecting node N111 to node N1111 is assigned a relative probability of 95.83%. All protocol lines assigned to node N111 whose fourth substring is h are assigned to node N1111. FIG. 1 also shows the node M1112, which is not contained in the syntax tree and symbolizes the transition probability for the partial character string g. The single protocol line concerned, L58, is assigned to this node and/or is no longer used for creating the syntax tree.
All protocol lines that have a time stamp in first place, the substring b in second place and e in third place have the partial character string i in fourth place, so the relative frequency of i among these protocol lines is 1. Since this relative frequency exceeds the first and second probability threshold values θ1, θ2 and no other partial character string does so, only a single further node N1211 is inserted into the syntax tree according to rule R2a, via a newly created directed edge e1211. This means that the syntax tree contains only one option for protocol lines at this point. The directed edge e1211 thus created is assigned the probability value 1, and all protocol lines of node N121 are assigned to the newly created node N1211.
Looking at the protocol lines that have a time stamp at the first position, the partial character string c at the second position and the partial character string f at the third position, several partial character strings can be identified as potential successors at the fourth position. Their path frequencies are: g: 0.4167; h: 0.167; i: 0.25; w, x, y, z: 0.04167 each.
This means that the relative probabilities for the occurrence of the substrings g, h and i exceed the first probability threshold Θ1, but the relative probabilities for the occurrence of the substrings w, x, y and z do not. Since several probability values for the occurrence of partial character strings exceed the first probability threshold value Θ1, one of the rules R3a, R3b applies.
The sum of the path frequencies of the partial character strings whose relative probability exceeds the first probability threshold θ1 is 0.4167 + 0.167 + 0.25 = 0.8337 and therefore does not exceed the third probability threshold θ3. For this reason, according to rule R3b, no separate nodes are created here for the partial character strings g, h and i; instead, a single variable node N1311 is created with a pattern that matches any partial character string.
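Assuming the rounded path frequencies above are shares of 24 protocol lines (0.4167 ≈ 10/24, 0.167 ≈ 4/24, 0.25 = 6/24, 0.04167 ≈ 1/24, which together account for all 24 lines), the exact sum is 20/24; the value 0.8337 results from adding the rounded figures. The bounds on θ1 and θ3 below are what the R3b decision implies, not values stated in the description.

```latex
\begin{aligned}
f(g) + f(h) + f(i) &= \tfrac{10}{24} + \tfrac{4}{24} + \tfrac{6}{24} = \tfrac{20}{24} \approx 0.8333,\\
\text{R3b rather than R3a} &\iff \tfrac{20}{24} \le \theta_3,\\
f(w) = f(x) = f(y) = f(z) = \tfrac{1}{24} &\le \theta_1 < \tfrac{4}{24} = f(h).
\end{aligned}
```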
Next, the protocol lines assigned to node N141 are processed further, i.e. the protocol lines that have a time stamp at the first position, the substring d at the second position and any substring at the third position. All of these lines turn out to have the substring i at the fourth position. As described for other nodes, a single node N1411 with a pattern corresponding to the partial character string i is created in accordance with rule R2a.
The nodes N1111, N1211, N1311, N1411 created in the fourth step are assigned to a fourth layer Y4 of the syntax tree.
In a fifth step, the individual protocol lines are examined for their substrings at the fifth position.
The protocol lines assigned to node N1111 contain only the partial character string j at the fifth position.
According to rule R2a, a single node N11111 with a pattern that matches only the substring j is inserted into the syntax tree.
However, the protocol lines also show that an arbitrary partial character string follows i. Each of these substrings is unique, so the path frequency of each of them is 1/24 ≈ 0.04167. None of these path frequencies exceeds the first probability threshold θ1, so case 1 (rule R1) applies, and a variable node is created for the reasons already mentioned.
Looking now at the protocol lines of node N1211, which have a time stamp in the first position, a b in the second, an e in the third and an i in the fourth position, one can see that not all of them have further substrings: some end at this point (protocol lines L15, L39, L83, L86). Accordingly, only 20 of the 24 protocol lines that have i in fourth place have the substring k in fifth place. Since the proportion of ending protocol lines is 4/24 ≈ 0.167 and exceeds the fourth probability threshold θ4 = 0.01, all subsequent nodes are treated as optional in accordance with rule R4. Since the proportion of protocol lines that have the substring i in fourth position and the substring k in fifth position is 20/24 ≈ 0.83 and exceeds the fifth probability threshold θ5 = 0.01, subsequent optional nodes are possible under rule R4. To decide whether a variable node or one or more fixed nodes are to be formed, the first three rules are then checked, using as base value the number of non-ending lines, 20. The lines that have the substring i in fourth position and the substring k in fifth position thus occur with a frequency of 20/20 = 1, which exceeds both θ1 and θ2. Rule R2a is therefore applied and a fixed node N12111 is formed. In the representation of the syntax tree, this is indicated by drawing node N1211 (pattern i) as an octagon. This means that a protocol line conforming to the syntax tree either matches N1211 at this node and ends immediately afterwards, or also matches all subsequent nodes, here N12111.
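The arithmetic of this R4 decision can be checked with exact fractions. The sketch follows the example in comparing shares (4/24 and 20/24) with θ4 and θ5; the function name `optional_end` is hypothetical.

```python
from fractions import Fraction


def optional_end(n_ending, n_continuing, theta4, theta5):
    """Rule R4 at one node, using shares of lines as in the N1211 example (sketch).

    Returns (optional, base): `optional` is True when the share of lines ending
    here exceeds theta4 and the share of continuing lines exceeds theta5;
    `base` is the number of non-ending lines, which rules R1 to R3 then use
    as the denominator for the following position.
    """
    total = n_ending + n_continuing
    optional = (Fraction(n_ending, total) > theta4
                and Fraction(n_continuing, total) > theta5)
    return optional, n_continuing


theta4 = theta5 = Fraction(1, 100)   # 0.01, as in the example above
optional, base = optional_end(n_ending=4, n_continuing=20, theta4=theta4, theta5=theta5)
share_k = Fraction(20, base)         # all 20 continuing lines have k at position 5
print(optional, base, share_k)       # True 20 1
```

Using `Fraction` keeps the comparison exact, so 20/20 is exactly 1 rather than a rounded float.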
Since the protocol lines assigned to nodes N1311 and N1411 have no partial character strings in the fifth position, the structure of the syntax tree for these nodes is complete and no new edges or nodes are inserted.
Since the protocol lines assigned to nodes N11111 and N12111 likewise have no partial character strings in the sixth position, the structure of the syntax tree is complete for these nodes as well, and no new edges or nodes are inserted.
The nodes N11111, N12111 created in the fifth step are assigned to a fifth layer Y5 of the syntax tree.
Rules for creating the syntax tree:
Regarding the substrings to be analyzed, the rules can be summarized as follows:
Rule R1: If it is not possible to find a pattern for the substrings that has a relative frequency that exceeds a probability threshold Θ1, a single node is inserted at the point in question that has a pattern that matches each substring. The newly created edge is assigned a probability of 1. All log lines assigned to the source node of the edge are assigned to the newly inserted target node.
Rule R2: If a single pattern can be found for the partial character strings so that the relative frequency of the partial character strings corresponding to the pattern exceeds a first probability threshold Θ1, a distinction is made as follows:
Rule R2a: If this relative frequency also exceeds the second probability threshold Θ2, a single node with the relevant pattern is inserted. The share of the substrings corresponding to the pattern is assigned to the respective newly created edge. All protocol lines assigned to the source node of the edge, which contain a substring corresponding to the pattern at the relevant position, are assigned to the newly inserted target node.
Rule R2b: If this relative frequency does not also exceed the second probability threshold θ2, a single node is inserted at the location in question, which has a pattern that results in a match with every substring. The newly created edge is assigned a probability of 1. All log lines assigned to the source node of the edge are assigned to the newly inserted target node.
Rule R3: If several patterns can be found for the partial character strings such that the relative frequency of the partial character strings corresponding to each individual pattern exceeds the first probability threshold θ1, a distinction is made as follows:
Rule R3a: If the sum of the relative frequencies of the patterns created in this way also exceeds a third probability threshold value θ3, a number of nodes are inserted, each with one of the patterns determined, each node being connected to the original node via an edge. The newly created edges are assigned the proportion of the partial character strings corresponding to the pattern in the total number of protocol lines assigned to the original node. The individual protocol lines are divided between the nodes, so that each protocol line is assigned to the node whose pattern corresponds to the sub-character string under consideration.
Rule R3b: If the sum of the relative frequencies of the patterns created in this way does not exceed a third probability threshold value θ3, a single node is inserted at the point in question, which has a pattern that corresponds to each substring. The newly created edge is assigned a probability of 1. All log lines assigned to the source node of the edge are assigned to the newly inserted target node.
Rule R4: If the number of log lines that end at the position in question exceeds a predefined fourth probability threshold value θ4 and the number of those log lines that do not end at this position exceeds a predefined fifth probability threshold value θ5, the option of an immediate line end can be added to the pattern of the node in question. In this case, when a log line is checked and a substring is compared with the pattern, a match is considered to be given if the respective substring matches the pattern and
- The subsequent partial character strings of the protocol line also correspond to the subsequent patterns of the syntax tree, or
- the log line ends after this substring.
The base value for deciding whether a variable node or one or more fixed nodes are created is the number of log lines that do not end at this position. The first three rules are then applied with this base value to decide which nodes are created.
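Rules R1 to R3 can be condensed into one decision function. This is a sketch under assumptions: the function name `choose_children`, the `None` placeholder for the variable pattern and the threshold values θ1 = 0.1, θ2 = 0.9, θ3 = 0.9 are hypothetical (the description gives no numbers for them; these were picked so that every decision in the walkthrough above comes out as described), and the substring counts are the ones quoted in the walkthrough. Rule R4 is not handled here; its count of non-ending lines would be passed as `base`.

```python
from collections import Counter

VARIABLE = None  # placeholder for a variable pattern that matches any substring


def choose_children(substrings, base, theta1, theta2, theta3):
    """Apply rules R1 to R3 at one position below one source node (sketch).

    substrings: substrings found at the examined position in the lines of the
                source node (lines that end earlier are simply not included).
    base:       denominator for the relative frequencies; normally the number
                of lines of the source node, or the non-ending lines under R4.
    Returns (rule, [(pattern, edge_probability), ...]).
    """
    freq = {s: n / base for s, n in Counter(substrings).items()}
    frequent = sorted(s for s, f in freq.items() if f > theta1)
    if not frequent:                                  # R1
        return "R1", [(VARIABLE, 1.0)]
    if len(frequent) == 1:                            # R2
        s = frequent[0]
        if freq[s] > theta2:
            return "R2a", [(s, freq[s])]
        return "R2b", [(VARIABLE, 1.0)]
    if sum(freq[s] for s in frequent) > theta3:       # R3
        return "R3a", [(s, freq[s]) for s in frequent]
    return "R3b", [(VARIABLE, 1.0)]


T = dict(theta1=0.1, theta2=0.9, theta3=0.9)  # hypothetical threshold values
cases = {
    "below N1 (2nd position)": list("abcd" * 24) + ["l"] * 4,
    "below N14 (3rd position)": ["e"] * 21 + ["x", "y", "z"],
    "below N111 (4th position)": ["h"] * 23 + ["g"],
    "below N131 (4th position)": ["g"] * 10 + ["h"] * 4 + ["i"] * 6 + list("wxyz"),
    "24 unique substrings": [f"id{k}" for k in range(24)],
}
for name, subs in cases.items():
    rule, kids = choose_children(subs, len(subs), **T)
    print(name, rule, [p for p, _ in kids])
```

With these values the five cases come out as R3a (a, b, c, d), R2b, R2a (h), R3b and R1, matching the decisions described for N11 to N14, N141, N1111, N1311 and the unique substrings.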
Check for abnormal conditions:
Two options for checking whether the computer system that creates the protocol lines is in an abnormal state are described below. In the first option, changes in the syntax tree are examined. In the simplest case, a syntax tree is created separately, in the manner described above, for each of two non-identical time periods. This yields two syntax trees that should have essentially the same shape as long as the system functions identically. In particular, if the time periods examined are long enough that the occurrence of the individual types of log lines can be predicted with sufficiently high probability, or in sufficiently large numbers, the two syntax trees should have essentially the same structure.
In this case, the trees can be compared with one another, and a comparison measure can be determined on the basis of the comparison, which indicates how strongly the two system states differ from one another. The individual probabilities used when creating the syntax tree can also be used to determine the measure of conformity.
Basically, different methods can be used to compare two syntax trees.
An advantageous method is, for example, to subsequently parse the log lines that were used to create one syntax tree using the other syntax tree. If this is also done the other way round, the overlaps in the assignments of the protocol lines can be used as a measure of similarity. Such measures of similarity are, for example, the F-Score or the Rand Index (Introduction to Information Retrieval, Manning, Christopher D. and Raghavan, Prabhakar and Schütze, Hinrich, Cambridge University Press, 2008).
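One way to realize this comparison, sketched under assumptions: each protocol line is labelled with the path it is assigned to under tree A and under tree B (for instance the id of the last matched node), and the two labellings are compared pairwise as clusterings, following the pair-counting definitions of the Rand index and the pairwise F measure in the cited Manning et al. text. The function names and path labels are hypothetical, and the quadratic loop over all pairs only suits small samples.

```python
from itertools import combinations


def pair_counts(a, b):
    """Count line pairs by whether they share a path under tree A and tree B."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long labellings with at least 2 lines")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:
            tp += 1
        elif same_b:
            fp += 1
        elif same_a:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(a, b):
    tp, fp, fn, tn = pair_counts(a, b)
    return (tp + tn) / (tp + fp + fn + tn)


def pairwise_f1(a, b):
    tp, fp, fn, _ = pair_counts(a, b)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical path labels for four protocol lines under tree A and tree B.
paths_a = ["N1111", "N1111", "N1211", "N1211"]
paths_b = ["N1111", "N1111", "N1111", "N1211"]
print(rand_index(paths_a, paths_b))   # 0.5
print(pairwise_f1(paths_a, paths_b))  # about 0.4
```

Identical assignments give a Rand index of 1.0; lower values indicate that the two trees group the protocol lines differently.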
Another preferred way of checking whether there is an abnormal system state is to have the individual protocol lines checked by a parser whose behavior is determined by the syntax tree, i.e. which checks whether a protocol line matches the syntax tree. Owing to the procedure described above, the protocol lines used to create the syntax tree conform to it with a high probability P1; this is not necessarily the case for protocol lines created later. If the probability P2 that the protocol lines created during another, in particular subsequent, time period conform to the syntax tree is significantly lower than P1, or lower by more than a predetermined threshold value, an abnormal system state can be detected.
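A sketch of the P1/P2 comparison, assuming the parser is available as a function that returns True for a conforming line. `toy_matches` is a hypothetical stand-in for such a parser, and `max_drop` stands for the predetermined threshold value. The first period reuses the 100 lines of the second step above; the second period is invented for illustration.

```python
def match_rate(lines, matches):
    """Share of protocol lines accepted by `matches` (e.g. a syntax-tree parser)."""
    lines = list(lines)
    if not lines:
        raise ValueError("no protocol lines in this period")
    return sum(1 for line in lines if matches(line)) / len(lines)


def is_abnormal(p1, p2, max_drop):
    """Flag an abnormal state if P2 is lower than P1 by more than max_drop."""
    return p1 - p2 > max_drop


def toy_matches(line):
    # Hypothetical stand-in for the parser: accept lines whose second substring is a-d.
    return len(line) > 1 and line[1] in {"a", "b", "c", "d"}


period1 = [["ts", s] for s in "abcd" for _ in range(24)] + [["ts", "l"] for _ in range(4)]
period2 = [["ts", "a"] for _ in range(80)] + [["ts", "q"] for _ in range(20)]
p1, p2 = match_rate(period1, toy_matches), match_rate(period2, toy_matches)
print(p1, p2, is_abnormal(p1, p2, max_drop=0.05))  # 0.96 0.8 True
```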
To check whether a log line conforms to the syntax tree, the log line is split into its substrings. The partial character strings of a protocol line L1, ..., L100 are compared, according to their position in the protocol line, with the patterns of the nodes N1, N11, N111, ... of the syntax tree in the order given by these nodes. If the partial character strings match the nodes, i.e. their patterns, along a directed partial path of the syntax tree, the protocol line as a whole conforms to the grammar specified by the syntax tree. If, on the other hand, no directed partial path can be found in the syntax tree to which the protocol line corresponds, the line does not conform to the grammar specified by the syntax tree and therefore indicates an abnormal state.
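A minimal parser for this check could look as follows. The `Node` class, the convention that a pattern of `None` is the variable pattern, the use of a variable pattern for the time-stamp node N1, and the requirement that a conforming line is consumed completely and ends at a leaf (or at a node carrying the optional end of rule R4) are all assumptions of this sketch. Only the b and d branches of the example tree are rebuilt.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    pattern: Optional[str]            # None = variable pattern (matches anything)
    optional_end: bool = False        # rule R4: the line may end right after this node
    children: List["Node"] = field(default_factory=list)

    def accepts(self, token: str) -> bool:
        return self.pattern is None or self.pattern == token


def matches(root: Node, tokens: Sequence[str]) -> bool:
    """True if the substrings follow one directed path starting at the root."""
    def walk(node: Node, i: int) -> bool:
        if i == len(tokens):
            # Line consumed: valid at a leaf or where an immediate end is allowed.
            return not node.children or node.optional_end
        return any(child.accepts(tokens[i]) and walk(child, i + 1)
                   for child in node.children)
    return walk(root, 0)


# b branch: N1 (time stamp) -> N12 b -> N121 e -> N1211 i (octagon) -> N12111 k
n1211 = Node("i", optional_end=True, children=[Node("k")])
n12 = Node("b", children=[Node("e", children=[n1211])])
# d branch: N14 d -> N141 (variable) -> N1411 i
n14 = Node("d", children=[Node(None, children=[Node("i")])])
tree = Node("<root>", children=[Node(None, children=[n12, n14])])

print(matches(tree, ["12:00", "b", "e", "i"]))       # True (optional end)
print(matches(tree, ["12:00", "b", "e", "i", "k"]))  # True
print(matches(tree, ["12:00", "b", "e", "x"]))       # False
```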
Modification of the syntax tree:
The syntax tree does not necessarily have to be determined anew for each individual time period. It can also be adapted successively, which avoids determining it from scratch, for example on a daily basis. The path probabilities specified when the syntax tree was created can be adjusted successively using the newly created protocol lines, the conditional transition probabilities of the individual edges being updated over a predetermined number of time windows.
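A minimal sketch of the successive update, assuming the "predetermined number of time windows" is handled as a sliding window over the last k windows (the description does not say how the windows are combined). The class name `EdgeStats`, k = 2, the marker for lines that follow none of the known edges and the example counts are all hypothetical.

```python
from collections import Counter, deque


class EdgeStats:
    """Conditional transition probabilities over the last k time windows (sketch)."""

    def __init__(self, k):
        self.windows = deque(maxlen=k)  # oldest window drops out automatically

    def add_window(self, transitions):
        """transitions: (source_node, target_node) pairs observed in one window."""
        self.windows.append(Counter(transitions))

    def probability(self, source, target):
        """Share of lines leaving `source` that continued to `target`."""
        hits = sum(w[(source, target)] for w in self.windows)
        leaving = sum(n for w in self.windows for (s, _), n in w.items() if s == source)
        return hits / leaving if leaving else 0.0


stats = EdgeStats(k=2)
OTHER = "<no matching child>"  # hypothetical marker for lines leaving no known edge
stats.add_window([("N111", "N1111")] * 23 + [("N111", OTHER)])
stats.add_window([("N111", "N1111")] * 12 + [("N111", OTHER)] * 12)
print(stats.probability("N111", "N1111"))  # 35/48, about 0.729
stats.add_window([("N111", OTHER)] * 24)   # the first window falls out of the deque
print(stats.probability("N111", "N1111"))  # 0.25
```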
The case may arise that the conditional transition probabilities of the individual edges formed in this way, in particular after the respective time window or a predetermined number of time windows, fall below a predetermined probability threshold value. This fact can be examined at predetermined time intervals. In this case, the following adaptations of the syntax tree can be made.
- Individual edges or directional paths whose probability has dropped below a certain threshold are deleted from the syntax tree. If a node is deleted from the syntax tree, all subsequent nodes and edges of the syntax tree are also deleted.
- The fixed (i.e. not arbitrary) pattern assigned to a node is replaced by a variable pattern that matches any substring.
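The two adaptations listed above can be sketched as one tree walk. The `Node` layout (children keyed by node id, edge probabilities stored on the parent), the function name `adapt`, the threshold 0.3 and the example probabilities are hypothetical; recomputing edge probabilities after a pattern has been generalized is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    pattern: Optional[str]  # None = variable pattern matching any substring
    children: Dict[str, "Node"] = field(default_factory=dict)  # child id -> node
    edge_prob: Dict[str, float] = field(default_factory=dict)  # child id -> probability


def adapt(node, threshold, generalize=False):
    """Adapt the tree below `node` where edge probabilities fell below `threshold`.

    generalize=False: delete the edge; the child's whole subtree goes with it.
    generalize=True:  keep the edge but replace the child's fixed pattern by a
                      variable one (edge probabilities are not recomputed here).
    """
    for child_id in list(node.children):
        child = node.children[child_id]
        if node.edge_prob.get(child_id, 0.0) < threshold:
            if not generalize:
                del node.children[child_id]
                node.edge_prob.pop(child_id, None)
                continue
            child.pattern = None
        adapt(child, threshold, generalize)


def small_tree():
    # N111 (e) -> N1111 (h, probability dropped to 0.25) -> N11111 (j)
    n1111 = Node("h", children={"N11111": Node("j")}, edge_prob={"N11111": 1.0})
    return Node("e", children={"N1111": n1111}, edge_prob={"N1111": 0.25})


pruned = small_tree()
adapt(pruned, threshold=0.3)
print(list(pruned.children))                  # []: N1111 and N11111 are gone

generalized = small_tree()
adapt(generalized, threshold=0.3, generalize=True)
print(generalized.children["N1111"].pattern)  # None (variable pattern)
```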
If, at a later point in time after the syntax tree has been created, a significantly large number of log lines occurs that are not assigned to any of the directed paths of the syntax tree formed by nodes and edges, a corresponding path characterizing these log lines can be created in the syntax tree by modification. In this case, the path probabilities assigned to the individual edges are adapted to the newly occurring log lines. For this to happen, the conditions used previously must be met again.

Claims (16)
1. A method of characterizing the state of a computer system, wherein
- logs are created by the computer system or by processes running on it, in that, when predetermined events occur, a log line (L1, ..., L100) is created for each of these events, the log line (L1, ..., L100) describing the logged event, and
- each protocol line (L1, ..., L100) is divided into a number of partial character strings, characterized in that
- on the basis of the individual protocol lines (L1, ..., L100), the sequence of the individual substrings contained in the protocol lines (L1, ..., L100) and the frequency of occurrence of the protocol lines and of the substrings in the protocol lines, a syntax tree describing the possible sequence of substrings is created, and
- this syntax tree is regarded as characteristic of the state of the computer system.
2. The method according to claim 1, characterized in that the syntax tree is created on the basis of the individual protocol lines as an acyclic, directed graph, wherein
- the syntax tree has further nodes (N1, N2), each of which is assigned a pattern (P1, P2) that, when applied to one of the substrings, delivers a positive or negative match value,
- the syntax tree has individual directed edges that connect their respective source node with their respective target node if, among the individual protocol lines, the conditional probability
  - that, on condition that the protocol line (L1, ..., L100) contains partial character strings which, in their sequence, each yield a positive match with the patterns of the individual nodes (N1, N11, N111) on a directed partial path leading from the root node (N) of the syntax tree to the source node (N11) of the edge (e111), in accordance with the order of the nodes (N1, N11, N111) in this partial path,
  - the next partial character string (s13) in the relevant protocol line (s1) is a partial character string that yields a positive match with the pattern stored in the target node (N111) of the relevant edge (e111),
  exceeds a threshold value that depends on the position in the syntax tree, and
- in particular, the conditional transition probability is assigned to the edge in question.
3. The method according to claim 1 or 2, characterized in that
a) a syntax tree is created, in particular by a method according to one of the preceding claims, on the basis of a number of predetermined protocol lines (L1, ..., L100) or of protocol lines created within a first time period,
b) during a second time period, the protocol lines generated by the computers or by the processes running on these computers for each predetermined event that occurs are determined,
c) a parser is used to check whether and/or to what extent the protocol lines determined in step b) meet the rules specified by the syntax tree, and
d) an abnormal condition is detected in particular when
- the number of protocol lines determined during the first time period that meet the rules specified by the syntax tree and
- the number of protocol lines determined during the second time period that meet the rules specified by the syntax tree
differ from one another by a predetermined amount.
4. The method according to any one of the preceding claims, characterized in that
- a syntax tree according to one of the preceding claims is determined for the same computer system at different times, or for different systems with a similar structure and purpose,
- deviations between the syntax trees thus created are searched for, and
- in the event of deviations exceeding a predetermined threshold value, a deviating, in particular critical or abnormal, state of the computer system is reported.
5. The method according to any one of the preceding claims, characterized in that the syntax tree is a rooted tree, preferably a rooted out-tree.
6. The method according to any one of the preceding claims, characterized in that
- when the syntax tree is created, the individual partial character strings of the protocol lines are stored in a two-dimensional memory with two access indices, the first access index indicating the protocol line as a line index and the second access index indicating the position of the partial character string within the respective protocol line as a position index,
- for the substrings to which the lowest position index is assigned,
  - a search is made for a number of patterns that describe the majority of the partial character strings,
  - the probability that one of the substrings matches the pattern is determined for each individual pattern,
  - a node of a first layer is inserted into the syntax tree for each individual pattern,
  - the respective pattern, and those protocol lines whose partial character strings match the pattern of the node, are assigned to this node,
  - this node is connected as the target node via a directed edge to the root node of the syntax tree, and
  - this edge is assigned the respective previously determined probability, and
- for incrementally increasing position indices of the partial character strings in the protocol lines, separately for each group of protocol lines assigned to a base node of the immediately preceding layer of the graph:
  - a search is made for a number of patterns that describe the majority of the partial character strings at the position determined by the respective position index,
  - the probability that the respective partial character strings with the relevant position index match the pattern is determined for each individual pattern,
  - a node of the layer corresponding to the position index is inserted into the graph for each individual pattern,
  - the respective pattern, and those protocol lines whose partial character strings match the pattern of the node, are assigned to this node,
  - this node is connected as the target node to the base node via a directed edge, and
  - this edge is assigned the respective previously determined probability.
7. The method according to any one of the preceding claims, characterized in that, in the event that no pattern can be found for the substrings that has a relative frequency exceeding a probability threshold value θ1, a single node having a pattern that results in a match with every partial character string is inserted at the relevant location, in particular a probability of 1 being assigned to the respective newly created edge and/or all protocol lines assigned to the source node of the edge being assigned to the newly inserted target node.
8. The method according to any one of the preceding claims, characterized in that, in the event that a pattern can be found for the partial character strings such that the relative frequency of the partial character strings corresponding to the pattern exceeds a first probability threshold value θ1,
- in the event that this relative frequency also exceeds the second probability threshold θ2, a single node with the relevant pattern is inserted, in particular the portion of the substrings corresponding to the pattern being assigned to the respective newly created edge and/or all protocol lines assigned to the source node of the edge that contain a substring corresponding to the pattern at the relevant position being assigned to the newly inserted target node, and/or
- in the event that this relative frequency does not also exceed the second probability threshold θ2, a single node having a pattern that results in a match with every substring is inserted at the relevant point, in particular a probability of 1 being assigned to the respective newly created edge and/or all protocol lines assigned to the source node of the edge being assigned to the newly inserted target node.
9. The method according to any one of the preceding claims, characterized in that, in the event that several patterns can be found for the partial character strings such that the relative frequency of the partial character strings corresponding to each individual pattern exceeds a first probability threshold value θ1,
- in the event that the sum of the relative frequencies of the patterns so found also exceeds a third probability threshold value θ3, a number of nodes are inserted, each with one of the patterns determined, in particular each node being connected to the original node via an edge and/or the newly created edges being assigned the proportion of the partial character strings corresponding to the pattern in the total number of protocol lines assigned to the original node and/or the individual protocol lines being divided among the nodes, so that each protocol line is assigned to the node whose pattern corresponds to the partial character string under consideration, and/or
- in the event that the sum of the relative frequencies of the patterns so found does not exceed a third probability threshold value θ3, a single node having a pattern that results in a match with every substring is inserted at the point in question, in particular a probability of 1 being assigned to the respective newly created edge and/or all protocol lines assigned to the source node of the edge being assigned to the newly inserted target node.
10. The method according to any one of the preceding claims, characterized in that, in the event that the number of protocol lines that end at the relevant position exceeds a predetermined fourth probability threshold θ4 and the number of protocol lines that do not end at the relevant position exceeds a predetermined fifth probability threshold value θ5, the option of an immediate line end is added to the pattern of the node in question, and, when a protocol line is checked and a substring is compared with the pattern, a match is considered to be given if the respective substring matches the pattern and
- the subsequent partial character strings of the protocol line also correspond to the subsequent patterns of the syntax tree, or
- the protocol line ends after this substring.
11. The method according to any one of the preceding claims, characterized in that the threshold values used in creating the syntax tree increase with increasing distance from the root node of the syntax tree or with increasing path depth in the syntax tree, and/or are adapted to the distance from the root node or to the path depth, in particular such that
- The first to fourth probability threshold values increase or decrease monotonically with increasing distance from the root node or increasing path depth in the syntax tree.
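One possible depth-dependent schedule for the thresholds of claim 11 is a clamped linear function of the depth. The linear form, the clamping to [0, 1], the `Schedule` name and the numeric values are assumptions made for illustration. The claim names a monotonic increase or decrease only as one variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """Hypothetical depth-dependent threshold: base + step * depth, clamped to [0, 1]."""
    base: float
    step: float  # > 0 increases with depth, < 0 decreases, 0 keeps the threshold constant

    def at(self, depth: int) -> float:
        return min(1.0, max(0.0, self.base + self.step * depth))


# Illustrative values only; the claim does not fix any numbers.
THRESHOLDS = {
    "theta1": Schedule(0.5, 0.05),
    "theta2": Schedule(0.2, 0.02),
    "theta3": Schedule(0.9, -0.01),
    "theta4": Schedule(0.1, 0.01),
}


def thresholds_at(depth: int) -> dict:
    """First to fourth probability thresholds for a node at the given depth."""
    return {name: schedule.at(depth) for name, schedule in THRESHOLDS.items()}
```

Because each threshold is a monotone function of depth, a deep node can be made stricter or more lenient than the root without any per-node tuning.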
12. The method according to any one of the preceding claims, characterized in that the following are specified as patterns:
- Predefined basic patterns, in particular IP addresses or other structured data, and/or
- Individual character strings defined during the creation of the syntax tree.
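Claim 12 combines predefined basic patterns with literal strings. The following is a minimal sketch under these assumptions:

- Tokens come from a whitespace tokenizer.
- There are two hypothetical basic patterns: `<IP>`, checked with Python's standard `ipaddress` module, and `<INT>` for unsigned integers.

Each token is mapped either to the first basic pattern it matches or to itself as a literal pattern.

```python
import ipaddress
import re
from typing import Callable, Dict, List


def _is_ip(token: str) -> bool:
    """True for IPv4 and IPv6 addresses, using the standard library parser."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


# Hypothetical basic patterns, checked in insertion order; the first match wins.
BASIC_PATTERNS: Dict[str, Callable[[str], bool]] = {
    "<IP>": _is_ip,
    "<INT>": lambda token: re.fullmatch(r"[0-9]+", token) is not None,
}


def generalize(token: str) -> str:
    """Map a token to the first matching basic pattern, else keep it as a literal pattern."""
    for name, check in BASIC_PATTERNS.items():
        if check(token):
            return name
    return token


def generalize_line(line: str) -> List[str]:
    """Apply `generalize` to every whitespace-separated token of a log line."""
    return [generalize(token) for token in line.split()]
```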
13. The method according to any one of the preceding claims, characterized in that
- For the individual paths, the conditional transition probability is determined over a predetermined number of time windows, and
- The further time course of the conditional transition probabilities of the individual edges determined in this way is examined, in particular after each time window or after a predetermined number of time windows, to determine whether they fall below a predetermined probability threshold value, and in this case
- The paths so determined are deleted from the graph and/or
- Individual nodes of the graph are assigned variable parts of the protocol lines instead of unchangeable parts.
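The pruning rule of claim 13 can be sketched as follows. For each time window, the sketch records the conditional transition probability of every edge, read here as P(child | parent). It flags an edge once that probability has stayed below the threshold for a number of consecutive windows. The following are assumptions, not part of the claim:

- The `patience` parameter and the `EdgeMonitor` class.
- The choice to skip windows in which the parent node was not visited.

Only deletion is shown, not the alternative of turning a node into a variable part.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent node id, child node id)


class EdgeMonitor:
    """Records P(child | parent) for each edge, one value per time window."""

    def __init__(self, threshold: float, patience: int) -> None:
        self.threshold = threshold
        self.patience = patience  # hypothetical: windows in a row below threshold before pruning
        self.history: Dict[Edge, List[float]] = defaultdict(list)

    def close_window(self, edge_counts: Dict[Edge, int], parent_counts: Dict[str, int]) -> None:
        """Store one window: edge_counts[(p, c)] lines took the edge, parent_counts[p] reached p."""
        for parent, child in set(self.history) | set(edge_counts):
            seen = parent_counts.get(parent, 0)
            if seen == 0:
                continue  # parent not visited in this window: no evidence either way
            self.history[(parent, child)].append(edge_counts.get((parent, child), 0) / seen)

    def edges_to_prune(self) -> List[Edge]:
        """Edges whose probability stayed below the threshold for the last `patience` windows."""
        return sorted(
            edge
            for edge, probs in self.history.items()
            if len(probs) >= self.patience
            and all(p < self.threshold for p in probs[-self.patience:])
        )
```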
14. The method according to any one of the preceding claims, characterized in that, in the event that at a later point in time after the creation of the syntax tree a significantly high number of protocol lines occur that are not assigned to any of the directed paths of the syntax tree formed by nodes and edges, a corresponding path characterizing these protocol lines can be newly created in the syntax tree by modification, and, if necessary, the path probabilities assigned to the individual edges are adapted to the newly occurring protocol lines.
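The adaptation of claim 14 can be sketched with a simple trie standing in for the syntax tree. Each node counts the lines that passed through it, so edge probabilities (child count divided by parent count) update automatically when a path is added. The following are all assumptions:

- The trie representation.
- The `min_share` criterion for a "significantly high number".
- The decision to add unmatched lines verbatim, without generalizing them into patterns.

```python
from collections import Counter
from typing import Dict, List, Sequence


class Node:
    """Trie node standing in for a syntax-tree node (assumption: exact tokens, no wildcards)."""

    def __init__(self) -> None:
        self.count = 0  # log lines that passed through this node
        self.ends = 0  # log lines that ended at this node
        self.children: Dict[str, "Node"] = {}


def add_path(root: Node, tokens: Sequence[str], count: int = 1) -> None:
    node = root
    node.count += count
    for token in tokens:
        node = node.children.setdefault(token, Node())
        node.count += count
    node.ends += count


def matches(root: Node, tokens: Sequence[str]) -> bool:
    node = root
    for token in tokens:
        node = node.children.get(token)
        if node is None:
            return False
    return node.ends > 0


def edge_probability(parent: Node, token: str) -> float:
    """Share of the parent's lines that continue with `token`, recomputed from the counts."""
    return parent.children[token].count / parent.count


def adapt(root: Node, unmatched: List[List[str]], window_size: int, min_share: float) -> bool:
    """Add paths for unmatched lines if they exceed `min_share` of the window (hypothetical rule)."""
    if window_size == 0 or len(unmatched) / window_size <= min_share:
        return False
    for tokens, n in Counter(tuple(line) for line in unmatched).items():
        add_path(root, tokens, n)
    return True
```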
15. The method according to any one of the preceding claims, characterized in that at least one further protocol line is created by the computer system or by processes running on it, and that a parser created on the basis of the syntax tree is used to examine whether the further protocol line also matches the syntax tree, where a lack of a match may be regarded as an indication that a different system state exists.
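One way to obtain the parser of claim 15 is to compile the syntax tree into a single regular expression. Every line that fails to match it is then treated as a possible indication of a different system state. The sketch assumes the following:

- The tree is a nested dict.
- Tokens are separated by single spaces.
- A hypothetical `<*>` key marks nodes that match any token.
- An empty-string key marks where a line may end.

```python
import re
from typing import Dict

WILDCARD = "<*>"  # hypothetical key for a node that matches any single token
END = ""  # hypothetical key marking that a log line may end at this point

Tree = Dict[str, "Tree"]


def to_regex(tree: Tree, sep: str = " ") -> str:
    """Compile a nested-dict syntax tree into one regex alternation per node."""
    branches = []
    for token, subtree in tree.items():
        if token == END:
            branches.append("")  # the line may stop here
        else:
            head = r"\S+" if token == WILDCARD else re.escape(token)
            branches.append(sep + head + to_regex(subtree))
    return "(?:" + "|".join(branches) + ")"


def build_parser(tree: Tree) -> re.Pattern:
    return re.compile(to_regex(tree, sep=""))  # no separator before the first token


def is_anomalous(parser: re.Pattern, line: str) -> bool:
    """A line the parser cannot fully match may indicate a different system state."""
    return parser.fullmatch(line) is None


TREE: Tree = {
    "sshd:": {
        "Accepted": {"password": {"for": {WILDCARD: {END: {}}}}},
        "Failed": {"password": {"for": {WILDCARD: {END: {}}}}},
    }
}
```

Compiling the whole tree into one expression reduces matching to a single call. For large trees, walking the tree token by token may be preferable to building a very long pattern.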
16. Data carrier on which a program for performing a method according to one of the preceding claims is stored.

Similar technologies:

Publication No. | Publication date | Title

EP1330685B1|2012-12-12|Testing method and testing device for starting up systems which are controlled by means of a program logic

AT521665A1|2020-03-15|Grammar recognition

EP0533261A2|1993-03-24|Method and apparatus for recognizing uttered words in a speech signal

EP2800307B1|2016-06-08|Method for detecting deviations from a given standard state

DE102006055864A1|2008-05-29|Dialogue adaptation and dialogue system for implementation

EP3719651A1|2020-10-07|Method for characterizing the operating state of a computer system

WO2008025719A1|2008-03-06|Method for producing a size-optimized delta file

EP2302554A2|2011-03-30|Method for identifying a section of computer program contained in a computer storage system

DE102018128048A1|2020-05-14|Method and device for storing data and their relationships

EP1064606A1|2001-01-03|Data processing system and method for the automatic creation of a summary of text documents

DE10137297A1|2003-02-20|Method for automated testing of software or software component robustness in which the number of test cases to be run is reduced by not running test cases containing known critical test values

DE102020115344A1|2021-12-09|Method and device for the automatic investigation of the behavior of a technical facility

EP3531300A1|2019-08-28|Computer-implemented method for acquiring information

DE2613703C2|1985-11-28|Circuit arrangement for translating program texts

DE102018129138A1|2020-05-20|Method and system for determining a pair of table columns for linking

EP3668036A1|2020-06-17|Creation of a blockchain with blocks comprising an adjustable number of transaction blocks and multiple intermediate blocks

EP3531299A1|2019-08-28|Computer-implemented method for querying a search space

EP3764218A1|2021-01-13|Method for the computer-assisted interaction of an operator with a model of a technical system

EP3531302A1|2019-08-28|Computer-implemented method for searching for responses

EP3961447A1|2022-03-02|Method for detecting abnormal operating states of a computer system

WO1994007197A1|1994-03-31|Method of processing an application program on a parallel-computer system

DE19635902C1|1998-02-19|Process for the mechanical generation of an optimized knowledge base for a fuzzy logic processor

EP2682866B1|2020-02-26|Methods for the implementation of data formats

Polonski2008|Lernen von Protokollautomaten (Learning protocol automata)

EP0952534A1|1999-10-27|Method to automatically generate rules to classify images

Patent family:

Publication No. | Publication date

EP3582443B1|2020-12-30|

AT521665B1|2021-05-15|

EP3582443A1|2019-12-18|

Cited documents:

Publication No. | Filing date | Publication date | Applicant | Title

EP2299650A1|2009-09-21|2011-03-23|Siemens Aktiengesellschaft|Method for recognising anomalies in a control network|

EP2800307A1|2013-04-29|2014-11-05|AIT Austrian Institute of Technology GmbH|Method for detecting deviations from a given standard state|

DE202013011084U1|2013-12-06|2014-03-19|University Of Southern California|Training for a text-to-text application that uses a string-tree transformation for training and decoding|

EP3267625A1|2016-07-07|2018-01-10|AIT Austrian Institute of Technology GmbH|Method for detection of abnormal conditions in a computer network|

US6199062B1|1998-11-19|2001-03-06|International Business Machines Corporation|Reverse string indexing in a relational database for wildcard searching|

AT523829B1|2020-07-28|2021-12-15|Ait Austrian Inst Tech Gmbh|Method for detecting abnormal operating states of a computer system|

AT523948A1|2020-09-01|2022-01-15|Ait Austrian Inst Tech Gmbh|Method for detecting abnormal operating states of a computer system|

Priority:

Application No. | Publication No. | Priority date | Filing date | Title

ATA50461/2018A|AT521665B1|2018-06-11|2018-06-11|Grammar recognition|

EP19169705.1A|EP3582443B1|2018-06-11|2019-04-17|Grammar detection|
